Segmenting Human Hairs and Faces

ABSTRACT

Systems for segmenting human hairs and faces in color images are disclosed, with methods and processes for making and using the same. The image may be cropped around the face area and roughly centered. Optionally, the illumination environment of the input image may be determined. If the image is taken in a dark environment, or if the contrast between the face and hair regions and the background is low, an extra image enhancement may be applied. Sub-processes for identifying the pose angle and chin contours may be performed. A preliminary mask for the face may be created using multiple cues, such as skin color, pose angle, face shape, and contour information. An initial hair mask may be created using the abovementioned multiple cues plus texture and hair shape information. The preliminary face and hair masks are globally refined using multiple techniques.

CROSS REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Patent Application No. 61/341,707, filed Apr. 5, 2010, which is hereby incorporated by reference herein in its entirety.

BACKGROUND OF THE INVENTION

Images have been utilized to capture precious moments since the advent of the photograph. With the emergence of the digital camera, an unimaginable number of photographs are captured every day. Certain precious moments have significant value to a particular person or group of people, such that photographs of a precious moment are often selected for a personalized presentation. For example, greeting card makers now allow users to edit, configure, or otherwise personalize their offered greeting cards, and a user will likely put in a photograph of choice to add their personal touch to a greeting card. Items that may be used for creating a personalized presentation abound, such as t-shirts, mugs, cups, hats, mouse-pads, other print-on-demand items, and other gift items and merchandise. Personalized presentations may also be created for sharing or viewing on certain devices, uploading to an online or offline location, or otherwise utilizing computer systems. For example, personalized presentations may be viewed on desktop computers, laptop computers, tablet user devices, smart phones, or the like, through online albums, greeting card websites, social networks, offline albums, or photo sharing websites.

Many applications exist for allowing a user to provide context to a photograph for providing a humorous, serious, sentimental, or otherwise personal message. Online photo galleries allow their customers to order such merchandise by selecting pictures from their albums. Kiosks are available at big retail stores all around the world to address similar needs. However, there is no approach available to the general population for creating a personalized presentation that allows for placing a person's head within another image or graphic. Indeed, the placement of a person's head from one image to another, though possessing the ability to create a multitude of applications, is a daunting task for the regular consumer and often, until now, is best handled by an expert image processor.

As should be apparent, there are needs for solutions that provide users with easier, automated, or semi-automated abilities for creating contextually personalized presentations of their images, which allow for the segmentation of a human head from one image to be utilized for other applications of choice, including creating personalized presentations.

SUMMARY

The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure, and it does not identify key or critical elements of the embodiments disclosed nor delineate the scope of the disclosed embodiments. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.

Segmenting or distinguishing salient regions within an image is a difficult task. Disclosed embodiments provide for creation of easier, semi-automated, and automated abilities for segmenting or distinguishing human heads from an image. In one embodiment, a head or heads to be distinguished are either provided, selected, located, or otherwise identified. The elements to distinguish may all be processed together or may be done separately. In one alternative embodiment, data for the face of the head is gathered or acquired. Some or all of this data is then processed to distinguish the face of the head from the other elements of the image. In another alternative embodiment, data for the hair of the head is gathered or acquired. Some or all of this data is then processed to distinguish the hair from the other elements of the image. In an additional embodiment, the data for both elements, face and hair, of the head is gathered or acquired, and this data is processed to distinguish the face and hair, individually or together, from the rest of the image. In an alternative additional embodiment, elements to be distinguished are selectively chosen to be distinguished and are done so individually or together. In images with more than one head, a single head to be distinguished may be chosen, or more than one head may be chosen.

The elements that are distinguished may be represented in any number of processes, including creating an image mask. A head that has been distinguished from one image may be placed into another image or graphic. The ability to create a unique personalized message or image may be utilized to attract users to physical locations or electronically available locations, such as websites and web-forums. The attraction of users may be associated with the ability to sell advertisement space or provide an advertising campaign, either specifically related to the creation of personalized messages or images, or generally otherwise.

In another alternative embodiment, the image may be cropped around the face area and roughly centered. Optionally, the illumination environment of the input image may be determined. If the image is taken in a dark environment, or if the contrast between the face and hair regions and the background is low, an extra image enhancement may be applied. Sub-processes for identifying the pose angle and chin contours may be performed. A preliminary mask for the face may be created using multiple cues, such as skin color, pose angle, face shape, and contour information. An initial hair mask may be created using the abovementioned multiple cues plus texture and hair shape information. The preliminary face and hair masks may be globally refined using multiple techniques.

BRIEF DESCRIPTION OF DRAWINGS

The accompanying drawings, which are included as part of the present specification, illustrate the presently preferred embodiments and, together with the general description given above and the detailed description of the preferred embodiments given below, serve to explain and teach the principles of the disclosed embodiments.

FIG. 1 is a diagrammatic illustration of a system, process or method for distinguishing elements representing a human head within an image, according to one embodiment.

FIG. 2 is a diagrammatic illustration of a system, process or method for distinguishing elements representing a human head within an image, according to another embodiment.

FIG. 3 is a sample color left image presented in gray scale utilized to illustrate the processes and sub-processes of the exemplary embodiments disclosed herein.

FIG. 4 is the sample image of FIG. 3 slightly zoomed in at the head and providing detection results from face and other elements detectors.

FIG. 5 is the sample image of FIG. 3 slightly zoomed in at the head and providing detection results by the ASM process.

FIG. 6 is a diagrammatic illustration of a system, method or process for generating an initial hair mask according to one embodiment.

FIG. 7 is a sample average hair mask displayed as a binary image from the frontal pose angle.

FIG. 8 is a sample average hair mask displayed as a binary image from the pose angle of a left yaw rotation.

FIG. 9 is a sample average hair mask displayed as a binary image from the pose angle of a right yaw rotation.

FIG. 10 is a sample average long hair mask displayed as a binary image from the frontal pose angle.

FIG. 11 is a sample average long hair mask displayed as a binary image from the pose angle of a left yaw rotation.

FIG. 12 is a sample average long hair mask displayed as a binary image from the pose angle of a right yaw rotation.

FIG. 13 is a diagrammatic illustration of a system, method, or process for additionally refining the face and hair masks according to one embodiment.

FIG. 14 is an illustration of an exemplary embodiment of an architecture 1000 of a computer system suitable for executing the methods disclosed herein.

It should be noted that the figures are not drawn to scale and that elements of similar structures or functions are generally represented by like reference numerals for illustrative purposes throughout the figures. It also should be noted that the figures are only intended to facilitate the description of the preferred embodiments of the present disclosure. The figures do not illustrate every aspect of the disclosed embodiments and do not limit the scope of the disclosure.

DETAILED DESCRIPTION

Systems for segmenting human hairs and faces in color images are disclosed, with methods and processes for making and using the same.

In the following description, for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of the various inventive concepts disclosed herein. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the various inventive concepts disclosed herein.

Some portions of the detailed description that follow are presented in terms of processes and symbolic representations of operations on data bits within a computer memory. These process descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A process is here, and generally, conceived to be a self-consistent sequence of sub-processes leading to a desired result. These sub-processes are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as "processing" or "computing" or "calculating" or "determining" or "displaying" or "locating" or "finding" or "reconciling" or "identifying", or the like, may refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other such information storage, transmission, or display devices.

The disclosed embodiments also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk, including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories ("ROMs"), random access memories ("RAMs"), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.

The processes and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method sub-processes. The required structure for a variety of these systems will appear from the description below. In addition, the disclosed embodiments are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the disclosed embodiments.

In some embodiments an image is a bitmapped or pixmapped image. As used herein, a bitmap or pixmap is a type of memory organization or image file format used to store digital images. A bitmap is a map of bits, a spatially mapped array of bits. Bitmaps and pixmaps refer to the similar concept of a spatially mapped array of pixels. Raster images in general may be referred to as bitmaps or pixmaps. In some embodiments, the term bitmap implies one bit per pixel, while a pixmap is used for images with multiple bits per pixel. One example of a bitmap is a specific format used in Windows that is usually named with the file extension of .BMP (or .DIB for device-independent bitmap). Besides BMP, other file formats that store literal bitmaps include InterLeaved Bitmap (ILBM), Portable Bitmap (PBM), X Bitmap (XBM), and Wireless Application Protocol Bitmap (WBMP). In addition to such uncompressed formats, as used herein, the terms bitmap and pixmap refer to compressed formats. Examples of such bitmap formats include, but are not limited to, formats such as JPEG, TIFF, PNG, and GIF, to name just a few, in which the bitmap image (as opposed to vector images) is stored in a compressed format. JPEG is usually lossy compression. TIFF is usually either uncompressed, or losslessly Lempel-Ziv-Welch compressed like GIF. PNG uses deflate lossless compression, another Lempel-Ziv variant. More disclosure on bitmap images is found in Foley, 1995, Computer Graphics Principles and Practice, Addison-Wesley Professional, p. 13, ISBN 0201848406, as well as Pachghare, 2005, Comprehensive Computer Graphics: Including C++, Laxmi Publications, p. 93, ISBN 8170081858, each of which is hereby incorporated by reference herein in its entirety.

In typical uncompressed bitmaps, image pixels are generally stored with a color depth of 1, 4, 8, 16, 24, 32, 48, or 64 bits per pixel. Pixels of 8 bits and fewer can represent either grayscale or indexed color. An alpha channel, for transparency, may be stored in a separate bitmap, where it is similar to a greyscale bitmap, or in a fourth channel that, for example, converts 24-bit images to 32 bits per pixel. The bits representing the bitmap pixels may be packed or unpacked (spaced out to byte or word boundaries), depending on the format. Depending on the color depth, a pixel in the picture will occupy at least n/8 bytes, where n is the bit depth, since 1 byte equals 8 bits. For an uncompressed bitmap packed within rows, such as is stored in the Microsoft DIB or BMP file format, or in uncompressed TIFF format, the approximate size for an n-bit-per-pixel (2^n colors) bitmap, in bytes, can be calculated as: size ≈ width × height × n/8, where height and width are given in pixels. In this formula, header size and color palette size, if any, are not included. Due to effects of row padding to align each row start to a storage unit boundary such as a word, additional bytes may be needed.
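
As a worked illustration of the size formula above, the following short Python sketch (with hypothetical example values) computes the approximate payload size of an uncompressed, packed bitmap; header, palette, and row-padding bytes are ignored, as noted in the text.

    def bitmap_payload_bytes(width: int, height: int, bits_per_pixel: int) -> float:
        """Approximate size in bytes of an uncompressed, packed bitmap.

        Header and color-palette sizes are not included, and row padding to
        storage-unit boundaries is ignored, per size ~ width * height * n / 8.
        """
        return width * height * bits_per_pixel / 8

    # Example: a 1024 x 768 image at 24 bits per pixel is about 2.25 MiB.
    print(bitmap_payload_bytes(1024, 768, 24))  # 2359296.0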

In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions (sets of pixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images.

The result of image segmentation is a set of regions that collectively cover the entire image, or a set of contours extracted from the image. Each of the pixels in a region shares a similar characteristic or computed property, such as color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s).

Several general-purpose algorithms and techniques have been developed for image segmentation. Exemplary segmentation techniques are disclosed in The Image Processing Handbook, Fourth Edition, 2002, CRC Press LLC, Boca Raton, Fla., Chapter 6, which is hereby incorporated by reference herein for such purpose. Since there is no general solution to the image segmentation problem, these techniques often have to be combined with domain knowledge in order to effectively solve an image segmentation problem for a problem domain.

Some embodiments disclosed below create a mask, often stored as an alpha channel. In computer graphics, when a given image or portion of an image (or figure) is intended to be placed over another image (or background), the transparent areas can be specified through a binary mask. For each intended composite image there are three bitmaps: the image containing the figure, the background image, and an additional mask, in which the figure areas are given a pixel value of all bits set to 1's and the surrounding areas a value of all bits set to 0's. The mask may be nonbinary when blending occurs between the figure and its surroundings.

To put the figure image over the background, the processes may first mask out the ground pixels in the figure image with the binary mask by taking a pixel by pixel product of the two bitmaps. This preserves the figure pixels. Another product is performed between the inverse of the binary mask and the background, removing the area where the figure will be placed. Then, the processes may render the final image pixels by adding the two product results. This way, the figure pixels are appropriately placed while preserving the background. The result is a compound of the figure over the background. Other blending techniques may be used to blend the figure with the new background, such as smoothing at the figure mask boundary.
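
The compositing just described can be sketched with array operations as follows; this is a minimal illustration assuming 8-bit RGB images and a binary mask of 0s and 1s, with hypothetical variable names, rather than the exact blending used in any particular embodiment.

    import numpy as np

    def composite(figure, background, mask):
        """Place the figure over the background using a binary mask.

        figure, background: H x W x 3 uint8 arrays of the same size.
        mask: H x W array holding 1 inside the figure and 0 elsewhere.
        """
        mask3 = mask[..., None].astype(np.uint8)     # broadcast mask to 3 channels
        figure_part = figure * mask3                 # keep only figure pixels
        background_part = background * (1 - mask3)   # clear the area under the figure
        return figure_part + background_part         # compound of figure over background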

A figure mask may be produced by segmenting the figure region from the background. In computer vision, segmentation refers to the process of partitioning a digital image into multiple regions. The pixels in a region share similar characteristics or computed properties. They may be similar in color and intensity, or be part of a larger texture or object. Adjacent regions are significantly different with respect to the same characteristic(s). Masks representing the different elements of the head may comprise many layers, where each layer represents a meaningful region of pixels, such as face, hair, sunglasses, hat, and so on.

Throughout the present description of the disclosed embodiments described herein, all steps or tasks will be described using one or more embodiments. However, it will be apparent to one skilled in the art that the order of the steps described could change in certain areas, and that the embodiments are used for illustrative purposes and for the purpose of providing understanding of the inventive properties of the disclosed embodiments.

FIG. 1 is a diagrammatic illustration of a system, process or method for distinguishing elements representing a human head within an image, according to one embodiment. In this embodiment, at 100 a head or heads to be distinguished are either provided, selected, located, or otherwise identified. This may be done in any number of operations, including by an autonomous process, semi-autonomous process, or manually. The elements to distinguish may all be processed together or may be done separately. In one alternative embodiment, data for the face of the head at 101 is gathered or acquired. Some or all of this data is then processed at 102 to distinguish the face of the head from the other elements of the image. In another alternative embodiment, data for the hair of the head at 103 is gathered or acquired. Some or all of this data is then processed at 104 to distinguish the hair from the other elements of the image. In an additional embodiment, the data for both elements, face and hair, of the head at 100 is gathered or acquired, and this data is processed to distinguish the face and hair, individually or together, from the rest of the image. In an alternative additional embodiment, elements to be distinguished are selectively chosen to be distinguished and are done so individually or together. In images with more than one head, a single head to be distinguished may be chosen, or more than one head may be chosen.

The elements that are distinguished may be represented in any number of processes, including creating an image mask. The elements distinguished may be utilized to create creative, humorous, message-delivering, or otherwise personalized presentations. A head that has been distinguished from one image may be placed into another image or graphic. For example, the image mask of the head may be placed into an electronically created greeting card, or onto images with celebrities, or otherwise placed into images or graphics with the intention of representing a person's or persons' heads within the images or graphics. As should be apparent, the distinguished head may be utilized in a multitude of ways to create a personalized message or image. The ability to create a unique personalized message or image may be utilized to attract users to the creation of purchasable items, such as photos, t-shirts, coffee mugs, mouse-pads, greeting cards, and the like. The ability to create a unique personalized message or image may be utilized to attract users to physical locations or electronically available locations, such as websites and web-forums. The attraction of users may be associated with the ability to sell advertisement space or provide an advertising campaign, either specifically related to the creation of personalized messages or images, or generally otherwise.

FIG. 2 is a diagrammatic illustration of a system, process or method for distinguishing elements representing a human head within an image, according to another embodiment. In this embodiment, a face or faces are detected at 201. FIG. 3 is a sample color left image presented in gray scale utilized to illustrate the processes and sub-processes of the exemplary embodiments disclosed herein. The face detection operation may comprise any method of detecting faces, including utilizing a Haar-based Ada-Boost Classifier as disclosed in "Probabilistic Methods For Finding People", S. Ioffe, D. A. Forsyth, International Journal of Computer Vision, vol. 43, issue 1, pp. 45-68, 2001, and "Robust Real-time Object Detection," P. Viola, M. Jones, International Journal of Computer Vision, 2001, each of which is hereby incorporated in its entirety for this purpose. The face detector may be a hierarchical rule-based system where the rules were created based on performance on a set of training images. In an alternative embodiment, false alarms are removed by voting from the detection results and an independent skin detector, such as disclosed in "A Comparative Assessment Of Pixel-Based Skin Detection Methods", V. Vezhnevets, A. Andreeva, Technical Report, Graphics and Media Laboratory, 2005, which is hereby incorporated by reference in its entirety for this purpose.

In another alternative embodiment, the location and rotation angle of the eyes, nose, and mouth are also determined by Haar-based Ada-Boost Classifiers. The location of the eyes can help to decide the pose angle, and all of the facial components are useful for finding the face contour. FIG. 4 is the sample image of FIG. 3 slightly zoomed in at the head and providing detection results from face and other elements detectors. In FIG. 4, the region 401 represents the detected face area, the region 402 represents the eyes area, the region 403 represents the nose area, and the region 404 represents the mouth area. After the detection process, a rectangular window around each face region will be cropped for further processing. In an additional embodiment, the image is cropped to the face area for further processing.

Some images may optionally be processed to enhance image quality. For example, photos taken under a dark scene, a low contrast background, or an outdoor environment with strong directional sun light may be selected for enhancement.

An image taken under a dark background may be detected by checking whether the average intensity level of the luminance channel in the image falls below a chosen or set threshold. The contrast value of the image may be computed from the average range of the intensity distribution of the luminance channel near the face area. An image with a low contrast background can be found by setting a threshold on the contrast value. If the input image is detected to have a dark background or a low contrast environment, one process that may be initially applied to the image is disclosed in "Contrast Limited Adaptive Histogram Equalization," Karel Zuiderveld, Graphics Gems IV, pp. 474-485, which is hereby incorporated by reference in its entirety for this purpose. All of the parameter settings may be determined by cross-validation.
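
A minimal sketch of this selection step is shown below; the threshold values and the percentile-based contrast proxy are illustrative placeholders rather than the cross-validated settings mentioned above, and OpenCV's CLAHE is used as one readily available implementation of contrast limited adaptive histogram equalization.

    import cv2
    import numpy as np

    def enhance_if_needed(bgr, dark_thresh=80.0, contrast_thresh=40.0):
        """Apply CLAHE to the luminance channel when the image looks dark or low contrast."""
        ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
        luma = ycrcb[:, :, 0]
        is_dark = luma.mean() < dark_thresh
        low, high = np.percentile(luma, [5, 95])   # crude spread of the intensity distribution
        is_low_contrast = (high - low) < contrast_thresh
        if is_dark or is_low_contrast:
            clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
            ycrcb[:, :, 0] = clahe.apply(luma)
            return cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)
        return bgr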

Images taken in an outdoor environment sometimes contain directional sun light, which may cause specular and shadow effects on the face areas. All of the color channels of skin pixels under specular lighting effects usually become quite bright, and therefore the color-based skin detector may fail to detect those skin pixels. The idea of illumination correction is based on finding a new color space that is insensitive to specular light, as disclosed in "Beyond Lambert: Reconstructing Specular Surfaces Using Color," S. P. Mallick, T. E. Zickler, D. J. Kriegman, P. N. Belhumeur, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, which is hereby incorporated by reference in its entirety for this purpose.

The following is a system, method or process for enhancing an image based upon correcting outdoor environmental effects, according to one embodiment. Assuming that the sun light can be represented by a white color vector, i.e., [1,1,1] in the RGB color space, a transformation may be found that rotates one color channel to be parallel to the white color vector. The remaining two rotated color channels become independent of the white color vector, and therefore the color space formed by these new channels is insensitive to specular illumination from the sun. Denote these two rotated color channels as $I_p$ and $I_q$. The new image formed by $I_d = \sqrt{I_p^2 + I_q^2}$ mostly depends on diffuse reflectance. The final output of the illumination correction is the linear blending of the diffuse image and each RGB channel of the original image according to the following equation: $I_{R,G,B} = \alpha I_d + (1 - \alpha) I_{R,G,B}$, where $\alpha$ is the model parameter and may be set to 0.7.
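
The rotation-and-blend step can be sketched as follows; the particular orthonormal vectors p and q below are one valid choice of axes orthogonal to the white vector, and α = 0.7 follows the value given above. This is an illustrative sketch, not the exact transformation of the cited work.

    import numpy as np

    def correct_specular(rgb, alpha=0.7):
        """Blend a diffuse-dominated image with the original RGB channels.

        rgb: H x W x 3 float array in [0, 1].  The axes p and q span the plane
        orthogonal to the white vector [1, 1, 1], so I_d = sqrt(I_p^2 + I_q^2)
        is insensitive to specular (white) illumination.
        """
        p = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)   # orthogonal to [1, 1, 1]
        q = np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0)   # orthogonal to [1, 1, 1] and p
        i_p = rgb @ p
        i_q = rgb @ q
        i_d = np.sqrt(i_p ** 2 + i_q ** 2)              # diffuse-dominated image
        return alpha * i_d[..., None] + (1.0 - alpha) * rgb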

In the embodiment disclosed in FIG. 2, the chin contour is identified at 202. The chin contour usually has a rigid shape with little variation among different people, and high contrast from the background. The Active Shape Model (hereinafter "ASM") may be utilized to detect the chin contour. The shape of the chin is described by a set of control points. For each input face image, the ASM process tries to match the control points to the chin contour by an iterative approach. First, it searches for a better position for each control point around the current position based on the intensity profile. Then the internal model parameters are updated in order to have the best match to these newly found positions. The chin contour is a robust feature for initializing the face location and separating the chin and neck areas in the final stage of the segmentation process according to one embodiment. Some implementations of the ASM can find the contours of the two eyes, nose, mouth, and chin. FIG. 5 is the sample image of FIG. 3 slightly zoomed in at the head and providing detection results by the ASM process. In FIG. 5, the region 401 represents the detected face area from FIG. 4, 501 references the detected chin contour, and the region 502 represents the detected contours of the eyes, nose and mouth.

At 203 of FIG. 2, the pose angle is optionally identified. The three dimensional (or "3D") pose can be parameterized by a yaw (left-and-right) angle and a pitch (up-and-down) angle. The yaw angle is estimated based on a non-linear function $f_{yaw}$, which takes two parameters: the distance between the eye location and the center of the face, $w_{eye}$, and the width of the face, $w_{face}$. The eye location is returned by the parts or elements detector described above. A skin color detector is utilized to find the rough location of the face, and then the center of the face and the width of the face are computed. The non-linear function to estimate the yaw angle may be based on the following formula:

$f_{yaw} = \arcsin\left( \frac{w_{eye}}{w_{face}} \right).$
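
For reference, the estimate can be evaluated directly; the helper below simply applies the formula and assumes $w_{eye}$ and $w_{face}$ are positive with $w_{eye} \leq w_{face}$.

    import numpy as np

    def estimate_yaw(w_eye: float, w_face: float) -> float:
        """Yaw angle in radians from f_yaw = arcsin(w_eye / w_face)."""
        return float(np.arcsin(w_eye / w_face))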

At 204 of FIG. 2, an initial face mask is generated. The initial face mask may be found based upon shape models and/or color models. For example, a shape model may utilize the ASM control points to suggest the rough face area from the face box detected by the face detector. In the training phase, an average ASM shape model may be trained based on a set of training images. Then all the training images may be aligned by the average ASM model, and finally a generic face mask is generated by averaging all the registered target face masks in the training images. The value of each pixel in the generic face mask represents the likelihood probability of that pixel being inside the face area. In the testing phase, this generic face mask is registered by ASM control points with the face found in the test image, and then the overlapped face area with high likelihood probability will be selected as the initial face mask. In addition, the location of the face box from the face detector may be utilized for another initial estimation of the face mask. The area inside the face box could be applied to crop the face mask generated from the probabilistic template.
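
A minimal sketch of the training-phase averaging and the testing-phase thresholding is given below; the 0.5 likelihood threshold is an illustrative placeholder, and the masks are assumed to be already ASM-registered.

    import numpy as np

    def train_generic_face_mask(aligned_masks):
        """Average ASM-aligned binary face masks (each H x W, 1 inside the face).

        Each pixel of the result holds the likelihood of lying inside the face.
        """
        stack = np.stack([m.astype(np.float64) for m in aligned_masks], axis=0)
        return stack.mean(axis=0)

    def initial_face_mask(registered_generic_mask, likelihood_thresh=0.5):
        """Keep pixels of the registered generic mask with high likelihood."""
        return registered_generic_mask > likelihood_thresh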

An example color model is a 2-D Gaussian parametric model, trained using skin and non-skin pixels identified manually from a training data set. The description of a parametric color model is disclosed in "Systems and Methods for Rule-based Segmentation for Vertical Person or People With Full or Partial Frontal View in Color Images," U.S. patent application Ser. No. 12/735,093, filed on Jun. 4, 2010, and "System and Method for Segmenting an Image for a Person to Obtain a Mugshot," U.S. patent application Ser. No. 12/603,093, filed on Oct. 21, 2009, each of which is assigned to the assignee of the current application, and each of which is hereby incorporated by reference in its entirety. The color model helps to identify the skin pixels in the test image. Next, a set of color pixels is sampled from the refined face mask and used to re-calculate the 2-D Gaussian parametric model. The refined color model is then computed based on the statistics of the skin color intensity in the test image, and therefore the local variation of the skin color can be well captured by the re-calculated model. Finally, the results from the color model are combined with the initial face mask found by the shape model to jointly form a refined face mask.
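
The 2-D Gaussian color model can be sketched as below, here fit on chrominance samples (for example, Cb/Cr values of labeled skin pixels) and evaluated per pixel; the choice of channels and any decision threshold are assumptions for illustration.

    import numpy as np

    def fit_skin_gaussian(skin_cbcr):
        """Fit a 2-D Gaussian to N x 2 chrominance samples of skin pixels."""
        mean = skin_cbcr.mean(axis=0)
        cov = np.cov(skin_cbcr, rowvar=False)
        return mean, cov

    def skin_log_likelihood(cbcr, mean, cov):
        """Per-pixel log-likelihood under the skin model; cbcr is H x W x 2."""
        diff = cbcr - mean
        inv = np.linalg.inv(cov)
        maha = np.einsum('...i,ij,...j->...', diff, inv, diff)
        log_norm = np.log(2.0 * np.pi) + 0.5 * np.log(np.linalg.det(cov))
        return -0.5 * maha - log_norm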

In some cases, when the skin or shape model fails to capture the initial face mask, the results from the unstable model will be discarded, and only the results from the other, reliable model will be kept as the final result. The face skin mask may also be used to adjust the location of the initial face box returned by the face detector.

At 205 of FIG. 2, an initial hair mask is generated. FIG. 6 is a diagrammatic illustration of a system, method or process for generating an initial hair mask according to one embodiment. At 601 of FIG. 6, an initial hair mask is registered using an average hair template. In some embodiments, the most reliable region of the hair appearance is the location near the boundary of the top face area, except when the detected person has a bald head. In order to identify the hair near the top of the head, the location, width, and height of the forehead are first estimated based on the skin face map and the position of the eyes returned by the elements or parts detector described above. An average hair mask around the top of the head may be registered based on the center of the forehead, and the scale may be estimated based on the width and height of the forehead. FIG. 7 is a sample average hair mask displayed as a binary image from the frontal pose angle. FIG. 8 is a sample average hair mask displayed as a binary image from the pose angle of a left yaw rotation. FIG. 9 is a sample average hair mask displayed as a binary image from the pose angle of a right yaw rotation.

At 602 of FIG. 6, JigCut regions are computed. In one embodiment, the representation utilized to detect the initial hair region is called JigCut regions. A JigCut region, sometimes called a super-pixel, is a small region where the color variation is small. The detailed description of JigCut regions is disclosed in "System and Method for Segmentation of an Image into Tuned Multi-Scaled Regions," U.S. patent application Ser. No. 12/502,125, filed on Jul. 13, 2009, which is assigned to the assignee of the current application and is hereby incorporated by reference in its entirety. Other processes for creating JigCut regions include those disclosed in "Watersheds in Digital Spaces: An Efficient Algorithm Based on Immersion Simulations," Vincent, Luc, and Pierre Soille, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 6, June 1991, pp. 583-598, and "Mean Shift: A Robust Approach Toward Feature Space Analysis," D. Comanicu, P. Meer, IEEE Trans. Pattern Anal. Machine Intell., 24, 603-619, May 2002, each of which is incorporated by reference in its entirety for this purpose.
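
JigCut regions themselves are described in the referenced application; as a rough, generally available stand-in for experimentation, small color-homogeneous regions can be obtained with SLIC superpixels from scikit-image, as sketched below. This is not the JigCut algorithm.

    from skimage.segmentation import slic

    def compute_superpixels(rgb, n_segments=400):
        """Partition an RGB image into small regions of low color variation.

        Returns an H x W label map; n_segments and compactness are
        illustrative settings, not values from the referenced application.
        """
        return slic(rgb, n_segments=n_segments, compactness=10.0)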

At 603 of FIG. 6, a determination is made as to whether the image comprises low contrast and/or a dark background. An image that comprises low contrast or a dark background may need further processing in order to improve the final segmentation result. An image with low contrast may be detected by checking whether the variance of the distribution of the luminance channel falls below a threshold value. An image with a dark background may be detected by counting the number of dark pixels. If the image contains a large low contrast area or a dark background, the standard process of adaptive histogram equalization at 604, as described above, may be applied to enhance the contrast of the image. In an additional embodiment, JigCut regions in dark background or low contrast areas may have a large size and contain both head and background areas. Therefore, large JigCut regions around the face area may be further decomposed into smaller pieces based on the initial hair mask.

At 605 of FIG. 6, hair color is determined by JigCut voting. The process of initial hair detection tries to classify each JigCut region inside the initial hair mask by using a mixture of experts. In one embodiment, three different experts are implemented to detect the three different hair colors black, blonde, and grey based on color information. As should be apparent, any number of experts may be added or removed based upon the colors chosen to be detected.

The following formula may be utilized to define a black hair expert:

BlackHair_Expert = BlackHair_Expert1 ∧ BlackHair_Expert2

BlackHair_Expert1: Thresholding based on the Gaussian likelihood of the luminance channel in the YCbCr color space.

BlackHair_Expert2: Thresholding based on the distance to the line between (0,0,0) and (1,1,1) in the normalized RGB color space.

The following formula may be utilized to define a blonde hair expert:

Gaussian likelihood of the CbCr channels in the YCbCr color space.

The following formula may be utilized to define a grey hair expert:

GreyHair_Expert = GreyHair_Expert1 ∧ GreyHair_Expert2

GreyHair_Expert1: The average of the luminance channel in the YCbCr color space is larger than a threshold.

GreyHair_Expert2: Thresholding based on the distance to the line between (0,0,0) and (1,1,1) in the normalized RGB color space.

After the classification of hair color in each JigCut region, the final hair color may be determined by the voting results of all JigCut regions inside the registered hair mask. A set of pixels in the detected JigCut regions with the majority hair color may be selected to retrain the Gaussian mixture model or models. The new Gaussian mixture model may then be used to find more JigCut regions of hair pixels, which may be utilized to update the initial hair mask.
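
One of the color tests and the region-level voting can be sketched as follows; the normalized-RGB distance below is one way to measure closeness to the achromatic line between (0,0,0) and (1,1,1), and the expert function, labels, and thresholds are placeholders rather than the experts defined above.

    import numpy as np

    def dist_to_gray_line(rgb):
        """Per-pixel distance to the achromatic line in normalized RGB.

        Colors are first normalized so the channels sum to one; the gray line
        then maps to the point (1/3, 1/3, 1/3).  Small distances indicate
        nearly achromatic (black or grey) pixels.
        """
        norm = rgb / np.clip(rgb.sum(axis=-1, keepdims=True), 1e-6, None)
        return np.linalg.norm(norm - 1.0 / 3.0, axis=-1)

    def vote_hair_color(region_labels, hair_mask, expert_fn):
        """Classify each region inside the hair mask and return the majority color.

        expert_fn maps a boolean region mask to a label such as 'black',
        'blonde', or 'grey'; it stands in for the mixture of experts above.
        """
        votes = {}
        for label in np.unique(region_labels[hair_mask]):
            color = expert_fn((region_labels == label) & hair_mask)
            votes[color] = votes.get(color, 0) + 1
        return max(votes, key=votes.get)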

Optionally, at 606 of FIG. 6, the hair and/or face masks may be updated by a boosting tree classifier. For example, blonde hair and the associated human skin may have a similar color distribution in the RGB space, which might make it difficult for the Gaussian model to clearly distinguish pixels between the blonde hair and face skin areas. To assist in the distinguishing operation, a two-class classifier may be implemented based on boosted decision trees utilizing the Adaboost process, as disclosed in "Logistic regression, Adaboost, and Bregman distances," M. Collins, R. Schapire, and Y. Singer, Machine Learning, 48(1-3), 2002, which is hereby incorporated by reference in its entirety. This process allows for taking features of higher order image statistics, such as gradient and texture features, such as the operation disclosed in "Histograms of Oriented Gradients for Human Detection," N. Dalal, B. Triggs, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, which is hereby incorporated in its entirety by reference, to better distinguish face and hair regions.

At 607 of FIG. 6, the hair contour fitting process is performed. Due to the variation of hair color, as well as yaw and pitch angles, the hair region may not be found in all instances during the process of the initial hair detection. Hair contour fitting may be able to detect the hair region since the top of the hair contour usually can be well approximated by a parametric curve of round convex shape. In one embodiment, the features used to perform hair contour fitting are edges found by a Canny edge detector and the direction of the tangent vector of the image gradient. The hair contour may be defined by the following polynomial:

$y = y_{0} + h \cdot \left( \frac{2 \cdot (x - x_{0})}{w} \right)^{3}$

The fitting process performs an exhaustive search in the parametric space $(x_0, y_0, w, h)$ based on the face size and location, as well as yaw angle information. During each search iteration, the distance $d_i$ of each edge pixel $(x_i, y_i)$ in the input image to the nearest point on the curve defined by $(x_0, y_0, w, h)$ is computed, as well as the angular difference $\theta_i$ of the direction of the tangent vectors between those two points. The objective of the exhaustive search may be to find the curve associated with the maximum number of edge points for which both the Euclidean distance and the angular difference to the nearest point are smaller than given thresholds, as follows:

$C_{opt} = \underset{C(x_{0}, y_{0}, w, h)}{\operatorname{argmax}} \sum_{i} \delta\left( d_{i} < t_{1} \;\&\; \theta_{i} < t_{2} \right),$

where $t_1$ and $t_2$ are two thresholds, and $\delta(d_i < t_1 \,\&\, \theta_i < t_2) = 1$ if $d_i < t_1 \,\&\, \theta_i < t_2$ is true.
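
A simplified sketch of the exhaustive search is given below; the candidate parameter grid, the curve sampling density, and the thresholds $t_1$ and $t_2$ are illustrative assumptions, and the angular difference is not wrapped for brevity.

    import numpy as np

    def fit_hair_contour(edge_xy, edge_angle, candidates, t1=3.0, t2=0.3):
        """Exhaustive search for hair-contour parameters (x0, y0, w, h).

        edge_xy: N x 2 array of (x, y) edge-pixel coordinates (e.g. from Canny).
        edge_angle: N tangent directions in radians.
        candidates: iterable of (x0, y0, w, h) tuples to test.
        Returns the candidate with the largest number of inlier edge pixels.
        """
        best, best_score = None, -1
        for (x0, y0, w, h) in candidates:
            xs = np.linspace(x0 - w / 2.0, x0 + w / 2.0, 64)
            ys = y0 + h * (2.0 * (xs - x0) / w) ** 3            # curve from the polynomial above
            slope = 6.0 * h * (2.0 * (xs - x0) / w) ** 2 / w    # dy/dx along the curve
            curve_angle = np.arctan(slope)
            d = np.hypot(edge_xy[:, 0:1] - xs, edge_xy[:, 1:2] - ys)   # N x 64 distances
            nearest = d.argmin(axis=1)
            d_i = d[np.arange(len(edge_xy)), nearest]
            theta_i = np.abs(edge_angle - curve_angle[nearest])
            score = int(np.sum((d_i < t1) & (theta_i < t2)))
            if score > best_score:
                best, best_score = (x0, y0, w, h), score
        return best, best_score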

After the optimal curve $C_{opt}$ is found, pixels inside the curve may be added into the hair mask, and pixels outside the curve will be removed from the hair mask to the background. The results may be further refined by utilizing a random walk operation as disclosed below.

At 608 of FIG. 6, a determination of whether the head comprises long hair is performed. If the head does comprise long hair, a long hair refinement process may optionally be performed at 609. This operation may lead to a more accurate hair color model, utilizing the hair located on top of the head. A different average hair template may be registered, which is based upon a proper long hair template mask, optionally further based on the yaw angle. The operation for registering a hair template is described above. Each pixel inside the mask that does not currently appear in the hair or face mask is selected and then classified again by the refined hair and background Gaussian mixture models in the RGB color space. Optionally, the resulting hair mask may be further refined by a random walk operation as disclosed below. FIG. 10 is a sample average long hair mask displayed as a binary image from the frontal pose angle. FIG. 11 is a sample average long hair mask displayed as a binary image from the pose angle of a left yaw rotation. FIG. 12 is a sample average long hair mask displayed as a binary image from the pose angle of a right yaw rotation. Refined hair and face masks are outputted at 610.

Optionally, at 206 of FIG. 2, additional refinement of the face and hair masks may be performed. FIG. 13 is a diagrammatic illustration of a system, method, or process for additionally refining the face and hair masks according to one embodiment. As should be apparent, all sub-processes within this embodiment are optional and may be chosen based upon the type of refinement elected.

At 800 of FIG. 13, a least square mask fitting operation is performed. The least square mask fitting operation may be elected to refine the head mask, which includes both initial results of face and hair regions, based on a linear combination of channels. The channel set may be computed by the following sub-processes. First, a set of feature vectors is extracted from each pixel in the input image. Those features may include an edge strength map, hue, luminance, and derivative of Gaussian responses. The standard k-means algorithm may be applied to cluster those feature vectors, with the cluster centroids referred to as textons. The channel set may be constructed based on the fuzzy membership of textons. The optimal head mask and background are assumed to be represented as a linear combination of the channel set. The weights to linearly combine the channel set can be determined by the Least Square Approach, which is disclosed in "System and Method for Segmenting an Image for a Person to Obtain a Mugshot," U.S. patent application Ser. No. 12/603,093, filed on Oct. 21, 2009, which is assigned to the assignee of the current application and is hereby incorporated by reference in its entirety.

When there are regions in the background with color similar to skin or hair, such as wooden furniture or sand, it is possible to lose all skin or hair colored pixels after the least square fitting process. The face skin mask from the previous modules above may be considered to be a higher confidence mask, since it is more robust than the hair mask, which is harder to estimate due to complex color variation and different hair styles. This sub-process may add the skin mask to the refined mask after the least square fitting step. The hair contour may be re-evaluated based on the current head mask. Pixels inside the curve could be added into the hair mask, and pixels outside the curve could be removed from the hair mask to the background. In addition, if the fitting score is less than the fitting score returned at 607 of FIG. 6, all the missing hair pixels could be added back to the current segmentation result.

At 801 of FIG. 13, a determination of whether the head is bald is performed. If the person has a bald head, the initial hair estimation may have included some junk pixels above the head area. A round head classifier may be utilized to detect this case. The premise is that a bald head is close to round, and the contour of the bald head can be well approximated by a circle with unknown radius. The detection process picks up all the head contour points after the least square fitting process, and performs a least square fit to a circle. The final decision of a round head may be determined by thresholding the fitting score. For example, the threshold may be set to 1.1 by cross-validation over a set of training images. Any pixel outside the head contour will not be included in the refined mask if the head is detected to be round. If the final decision is not within the threshold, at 802, face skin regions and hair regions inside the head contour are included.
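
The circle fit can be sketched with an algebraic least-squares (Kåsa-style) formulation, as below; the returned residual is only one possible fitting score, so the 1.1 threshold quoted above should not be assumed to apply to it directly.

    import numpy as np

    def fit_circle(points):
        """Least-squares circle fit to N x 2 contour points.

        Solves x^2 + y^2 + a*x + b*y + c = 0 in the least-squares sense and
        returns (center_x, center_y, radius, mean_radial_residual); a small
        residual suggests a nearly round (possibly bald) head contour.
        """
        x, y = points[:, 0], points[:, 1]
        A = np.column_stack([x, y, np.ones_like(x)])
        rhs = -(x ** 2 + y ** 2)
        (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
        cx, cy = -a / 2.0, -b / 2.0
        r = np.sqrt(cx ** 2 + cy ** 2 - c)
        residual = np.abs(np.hypot(x - cx, y - cy) - r).mean()
        return cx, cy, r, residual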

At 803 of FIG. 13, a random walk operation may be applied to refine the boundary of the current segmentation result. A random walk process is disclosed in "Random Walks For Image Segmentation", Grady L., IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), pp. 1768-1783, vol. 28, November 2006, which is hereby incorporated by reference in its entirety. A revised version has been implemented and integrated in the current segmentation framework. This process first tries to construct the Laplacian matrix based on the closed-form affinity, as disclosed in "A Closed-Form Solution to Natural Image Matting," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), Vol. 30, No. 2, February, 2008, which is hereby incorporated by reference in its entirety.

Based on the initial segmentation results, a set of boundary pixels between the background and head mask may be selected. The labels of those boundary pixels are re-evaluated by solving a sparse linear system. The revised version includes a regularization term of a shape prior, and allows the unknown pixels to be specified and removed from the background.
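
A rough sketch of this boundary re-labeling is shown below using the stock Grady random walker available in scikit-image; it does not include the shape-prior regularization or the closed-form affinity of the revised version described above, and the band width and beta value are placeholders.

    import numpy as np
    from scipy.ndimage import binary_dilation, binary_erosion
    from skimage.segmentation import random_walker

    def refine_boundary(rgb, head_mask, band=5):
        """Re-evaluate labels in a thin band around the head-mask boundary.

        Pixels well inside the mask are seeded as foreground (1), pixels well
        outside as background (2), and pixels near the boundary are left
        unlabeled (0) for the random walker to resolve.
        """
        inner = binary_erosion(head_mask, iterations=band)
        outer = binary_dilation(head_mask, iterations=band)
        seeds = np.zeros(head_mask.shape, dtype=np.int32)
        seeds[inner] = 1        # confident foreground
        seeds[~outer] = 2       # confident background
        gray = rgb.mean(axis=-1)
        labels = random_walker(gray, seeds, beta=130)
        return labels == 1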

At 804, a convex hull sub-process is performed. After the global refinement in the previous sub-process, the refined segmentation results may have some dents in the top of the hair area because the Least Square Fitting may not have preserved the boundary of the hair contour. The heuristic that the top of the hair forms a convex contour shape can be used to fill those dents. A convex hull sub-process may be applied to find all of the small dents in the hair boundary. All the pixels inside the region of dents may be selected, and optionally, the random walk operation may again be performed to resolve the labels of those pixels and update the head mask.
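
The dent-finding heuristic can be sketched as below; applying the convex hull to the whole head mask (rather than only its top) is a simplification for illustration.

    from skimage.morphology import convex_hull_image

    def find_hair_dents(head_mask):
        """Pixels inside the convex hull of the head mask but not in the mask.

        These 'dent' pixels can then be re-labeled, e.g. by the random walk
        operation, as described above.
        """
        hull = convex_hull_image(head_mask)
        return hull & ~head_mask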

At 805, a determination of whether the refined mask's re-construction error under Principal Component Analysis (hereinafter "PCA") projection is within a threshold is performed. The PCA subspace may be trained on a set of face images registered by the ASM control points described above. The current refined face mask will be replaced at 806 by the re-constructed mask if the re-construction error is larger than a threshold. If replacement is chosen, the mask may optionally be re-processed by the random walk and/or the convex hull operations, and again re-evaluated by the determination at 805.
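
A minimal sketch of the projection check is given below, operating on flattened, ASM-registered face masks; the number of components and any error threshold are illustrative assumptions.

    import numpy as np

    def pca_train(mask_vectors, n_components=20):
        """Train a PCA subspace from an N x D matrix of flattened face masks."""
        mean = mask_vectors.mean(axis=0)
        _, _, vt = np.linalg.svd(mask_vectors - mean, full_matrices=False)
        return mean, vt[:n_components]           # rows are principal directions

    def pca_reconstruction_error(mask_vector, mean, components):
        """L2 error between a mask and its reconstruction from the PCA subspace."""
        coeffs = components @ (mask_vector - mean)
        recon = mean + components.T @ coeffs
        return float(np.linalg.norm(mask_vector - recon)), recon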

At 807, a hair and skin area resolution operation may optionally occur. The refined head mask after the Least Square Fitting and random walk operations may include pixels from the initial background mask. In this sub-process, those pixels will be assigned to either the hair or face mask by a Gaussian Mixture Model in the RGB color space and by a random walk operation.

At 808, areas outside the chin may optionally be cleaned up. For example, the neck areas from the current refined face mask may need to be removed. This sub-process may select all the pixels outside the chin contour specified by ASM control points. In one embodiment, the pixels that appear in the hair mask will be kept. The pixels outside the chin contour may be treated as neck, hand, or body areas of the person and removed from the current segmentation result. The random walk sub-process may again be applied to refine the chin boundary of the final segmentation output. At 809, the final segmentation result is outputted, optionally scaled accordingly.

As desired, the methods disclosed herein may be executable on a conventional general-purpose computer (or microprocessor) system. Additionally, or alternatively, the methods disclosed herein may be stored on a conventional storage medium for subsequent execution via the general-purpose computer. FIG. 14 is an illustration of an exemplary embodiment of an architecture 1000 of a computer system suitable for executing the methods disclosed herein. Computer architecture 1000 is used to implement the computer systems or image processing systems described in various embodiments of the method for segmentation. As shown in FIG. 14, the architecture 1000 comprises a system bus 1020 for communicating information, and a processor 1010 coupled to bus 1020 for processing information. Architecture 1000 further comprises a random access memory (RAM) or other dynamic storage device 1025 (referred to herein as main memory), coupled to bus 1020 for storing information and instructions to be executed by processor 1010. Main memory 1025 is used to store temporary variables or other intermediate information during execution of instructions by processor 1010. Architecture 1000 includes a read only memory (ROM) and/or other static storage device 1026 coupled to bus 1020 for storing static information and instructions used by processor 1010. Although the architecture 1000 is shown and described as having selected system elements for purposes of illustration only, it will be appreciated that the methods for segmentation disclosed herein can be executed by any conventional type of computer architecture without limitation.

A data storage device 1027, such as a magnetic disk or optical disk and its corresponding drive, is coupled to computer system 1000 for storing information and instructions. The data storage device 1027, for example, can comprise the storage medium for storing the method for segmentation for subsequent execution by the processor 1010. Although the data storage device 1027 is described as being a magnetic disk or optical disk for purposes of illustration only, the methods disclosed herein can be stored on any conventional type of storage media without limitation.

Architecture 1000 is coupled to a second I/O bus 1050 via an I/O interface 1030. A plurality of I/O devices may be coupled to I/O bus 1050, including a display device 1043 and an input device (e.g., an alphanumeric input device 1042 and/or a cursor control device 1041).

The communication device 1040 is for accessing other computers (servers or clients) via a network. The communication device 1040 may comprise a modem, a network interface card, a wireless network interface, or other well known interface device, such as those used for coupling to Ethernet, token ring, or other types of networks.

The foregoing described embodiments are provided as illustrations and descriptions. They are not intended to limit the embodiments to the precise forms described. In particular, it is contemplated that the functionality of the embodiments described herein may be implemented equivalently in hardware, software, firmware, and/or other available functional components or building blocks, and that networks may be wired, wireless, or a combination of wired and wireless. Other variations and embodiments are possible in light of the above teachings, and it is thus intended that the scope of the invention not be limited by this detailed description, but rather by the claims following.

1-3. (canceled)
4. A method comprising: generating a generic facial mask associated with a facial feature of a person, wherein the generic facial mask is generated based on a first plurality of training images, wherein the generic facial mask is configured to identify the facial feature of a particular person; generating a generic hair mask associated with a hair feature of a person, wherein the generic hair mask is generated based on a second plurality of training images, wherein the generic hair mask is configured to identify the hair feature of the particular person; and storing the generic hair and facial masks.

5. The method of claim 4, wherein the generic facial mask comprises a first plurality of control data points, wherein the first plurality of control data points is based on a first plurality of training data point sets, wherein each training data point set of the first plurality of training data point sets is associated with a facial feature of a training image of the first plurality of training images, and wherein the generic hair mask comprises a second plurality of control data points, wherein the second plurality of control data points is based on a second plurality of training data point sets, wherein each training data point set of the second plurality of training data point sets is associated with a hair feature of a training image of the second plurality of training images.

6. The method of claim 4, wherein the generic facial mask comprises a first plurality of control data points, and wherein the generating of the generic facial mask comprises averaging training data point sets associated with the first plurality of training images.

7. The method of claim 4, wherein the generic hair mask comprises a second plurality of control data points, and wherein the generating of the generic hair mask comprises averaging training data point sets associated with the second plurality of training images.

8. The method of claim 4, wherein the generic facial mask is a mask associated with a chin of a person.

9. The method of claim 4 further comprising: generating a skin model associated with a skin color of a person based on skin color intensity of a third plurality of training images; and storing the skin model.
10. A method comprising: receiving a desired image; receiving a generic facial mask associated with a facial feature of a person from a memory component, wherein the generic facial mask is based on a first plurality of training images, and wherein the generic facial mask is configured to identify a facial feature of a person in an image; receiving a generic hair mask associated with a hair feature of a person from the memory component, wherein the generic hair mask is based on a second plurality of training images, and wherein the generic hair mask is configured to identify a hair feature of a person in an image; applying the generic facial mask and the generic hair mask to the desired image; and identifying a facial feature and a hair feature in the desired image.

11. The method of claim 10, wherein the generic facial mask comprises a first plurality of control data points, wherein the first plurality of control data points is based on a first plurality of training data point sets, wherein each training data point set of the first plurality of training data point sets is associated with a facial feature of a training image of the first plurality of training images; and wherein the generic hair mask comprises a second plurality of control data points, wherein the second plurality of control data points is based on a second plurality of training data point sets, wherein each training data point set of the second plurality of training data point sets is associated with a hair feature of a training image of the second plurality of training images.

12. The method of claim 11, wherein applying the generic facial mask and the generic hair mask comprises iteratively applying the first plurality of control data points to the desired image to identify the facial feature in the desired image and iteratively applying the second plurality of control data points to the desired image to identify the hair feature in the desired image.

13. The method of claim 11, wherein the identified facial feature is substantially a match of the first plurality of control data points in the desired image, and wherein the identified hair feature is substantially a match of the second plurality of control data points in the desired image.

14. The method of claim 10 further comprising: refining the identified hair feature and the identified facial feature in the desired image; and forming a head mask of a person in the desired image based on the refined hair and facial features.

15. The method of claim 10 further comprising: dividing the identified hair feature into a plurality of hair regions, wherein each hair region of the plurality of hair regions has a hair color associated therewith; classifying each hair region of the plurality of hair regions by a hair color; determining an overall hair color associated with the identified hair feature based on a majority of the hair classification; identifying a portion in the desired image that includes the overall hair color; and refining the identified hair feature by including the identified portion into the identified hair feature.

16. The method of claim 10 further comprising: applying a plurality of hair contour data points to the desired image to identify an outer hair boundary between the identified hair feature and a background portion of the desired image; identifying the outer hair boundary in response to the outer hair boundary substantially matching the plurality of hair contour data points; identifying uneven portions at the outer hair boundary; smoothing out the uneven portions to form a new outer hair boundary that is substantially convex contour shaped; and updating the identified hair features to include the new outer hair boundary.

17. The method of claim 10 further comprising: determining an illumination level based on an average illumination level of the identified facial feature; determining whether the illumination level exceeds an illumination level threshold; and enhancing image quality of the identified facial feature in response to the illumination level exceeding the illumination level threshold.

18. The method of claim 10, wherein the generic facial mask is associated with a chin feature of a person, and wherein the method further comprises: identifying a chin contour in the desired image in response to a match resulting from application of the generic facial mask to the desired image; determining whether the identified facial feature includes portions outside of the chin contour; refining the identified facial feature by removing the determined portions from the identified facial feature in response to determining that the identified facial feature includes portions outside of the chin contour; and responsive to the refining, smoothing uneven portions along the chin contour to form a new chin contour, wherein the new chin contour is substantially convex shaped.

19. The method of claim 10, wherein applying the generic hair mask includes applying a bald head template to the desired image, and wherein the method further comprises: determining that a person in the desired image has a bald head if a match between the facial feature and the bald head template is within a predetermined threshold; and responsive to determining that a person in the desired image is bald, refining the identified hair features by removing portions from the identified hair features that are outside of a boundary represented by the bald head template.
20. A method comprising: generating a generic feature mask associated with a facial feature of a person that is substantially invariant from one person to another person, wherein the generic feature mask is based on a plurality of training images; and storing the generic feature mask.

21. The method of claim 20, wherein the generic feature mask comprises a plurality of control data points, wherein the plurality of control data points is based on a plurality of training data point sets, wherein each training data point set of the plurality of training data point sets is associated with a facial feature of a training image of the plurality of training images.

22. The method of claim 20, wherein the generic feature mask comprises a plurality of control data points, and wherein the generating the generic feature mask comprises averaging training data point sets associated with the plurality of training images.

23. The method of claim 20, wherein the facial feature of the person is selected from a group consisting of a face, a nose, eyes, a mouth, and a chin.